feat: inline Arrow IPC support with format fallback#273
Closed
jamesbroadhead wants to merge 7 commits intodatabricks:mainfrom
Closed
feat: inline Arrow IPC support with format fallback#273jamesbroadhead wants to merge 7 commits intodatabricks:mainfrom
jamesbroadhead wants to merge 7 commits intodatabricks:mainfrom
Conversation
Some serverless warehouses only support ARROW_STREAM with INLINE disposition, but the analytics plugin only offered JSON_ARRAY (INLINE) and ARROW_STREAM (EXTERNAL_LINKS). This adds a new "ARROW_STREAM" format option that uses INLINE disposition, making the plugin compatible with these warehouses. Fixes databricks#242
Tests verify: - ARROW_STREAM format passes INLINE disposition + ARROW_STREAM format - ARROW format passes EXTERNAL_LINKS disposition + ARROW_STREAM format - Default JSON format does not pass disposition or format overrides
The server-side ARROW_STREAM format added in the previous commit was not exposed to the frontend or typegen: - Add "ARROW_STREAM" to AnalyticsFormat in appkit-ui hooks - Add "arrow_stream" to DataFormat in chart types - Handle "arrow_stream" in useChartData's resolveFormat() - Make typegen resilient to ARROW_STREAM-only warehouses by retrying DESCRIBE QUERY without format when JSON_ARRAY is rejected Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
…compatibility ARROW_STREAM with INLINE disposition is the only format that works across all warehouse types, including serverless warehouses that reject JSON_ARRAY. Change the default from JSON to ARROW_STREAM throughout: - Server: defaults.ts, analytics plugin request handler - Client: useAnalyticsQuery, UseAnalyticsQueryOptions, useChartData - Tests: update assertions for new default JSON and ARROW formats remain available via explicit format parameter. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
When using the default ARROW_STREAM format, the analytics plugin now automatically falls back through formats if the warehouse rejects one: ARROW_STREAM → JSON → ARROW. This handles warehouses that only support a subset of format/disposition combinations without requiring users to know their warehouse's capabilities. Explicit format requests (JSON, ARROW) are respected without fallback. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
Previously, _transformDataArray unconditionally called updateWithArrowStatus for any ARROW_STREAM response, which discards inline data and returns only statement_id + status. This was designed for EXTERNAL_LINKS (where data is fetched separately) but broke INLINE disposition where data is in data_array. Changes: - _transformDataArray now checks for data_array before routing to the EXTERNAL_LINKS path: if data_array is present, it falls through to the standard row-to-object transform. - JSON format now explicitly sends JSON_ARRAY + INLINE rather than relying on connector defaults. This prevents the connector default format from leaking into explicit JSON requests. - Connector defaults reverted to JSON_ARRAY for backward compatibility with classic warehouses (the analytics plugin sets formats explicitly). - Added connector-level tests for _transformDataArray covering ARROW_STREAM + INLINE, ARROW_STREAM + EXTERNAL_LINKS, and JSON_ARRAY paths. Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
Some serverless warehouses return ARROW_STREAM + INLINE results as base64 Arrow IPC in `result.attachment` rather than `result.data_array`. This adds server-side decoding using apache-arrow's tableFromIPC to convert the attachment into row objects, producing the same response shape as JSON_ARRAY regardless of warehouse backend. This abstracts a Databricks internal implementation detail (different warehouses returning different response formats) so app developers get a consistent `type: "result"` response with named row objects. Changes: - Add apache-arrow@21.1.0 as a server dependency (already used client-side) - _transformDataArray detects `attachment` field and decodes via tableFromIPC - Connector tests use real base64 Arrow IPC captured from a live serverless warehouse, covering: classic JSON_ARRAY, classic EXTERNAL_LINKS, serverless INLINE attachment, data_array fallback, and edge cases Co-authored-by: Isaac Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
6 tasks
Contributor
Author
|
Closing in favour of #256, which addresses Mario's review feedback (simplified to 2 formats matching API enums: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds support for serverless SQL warehouses that return inline Arrow IPC data in
result.attachmentinstead ofdata_array, plus automatic format fallback for warehouse compatibility._transformArrowAttachmentdecodes base64 Arrow IPC fromresult.attachmentinto row objects viatableFromIPCChanges
packages/appkit/src/connectors/sql-warehouse/client.ts— Arrow IPC attachment detection and decodingpackages/appkit/src/plugins/analytics/analytics.ts—_executeWithFormatFallbackwith ARROW_STREAM → JSON → ARROW chainpackages/appkit-ui/src/react/hooks/types.ts— ARROW_STREAM format type supportpackages/appkit/src/type-generator/query-registry.ts— typegen compatibilityTest plan
pnpm test)Related
This pull request was AI-assisted by Isaac.